Incorporating Prior Knowledge into Task Decomposition for Large-Scale Patent Classification

Authors

  • Chao Ma
  • Bao-Liang Lu
  • Masao Utiyama
Abstract

With the adoption of min-max modular support vector machines (SVMs) to solve large-scale patent classification problems, a novel, simple method for incorporating prior knowledge into task decomposition is proposed and investigated. Two kinds of prior knowledge described in patent texts are considered: time information and hierarchical structure information. Through experiments using the NTCIR-5 Japanese patent database, patents are found to have time-varying features that considerably affect classification. The experimental results demonstrate that applying min-max modular SVMs with the proposed method gives performance superior to that of conventional SVMs in terms of training time, generalization accuracy, and scalability.
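The idea of decomposing a two-class task along prior knowledge and recombining the resulting modules with MIN and MAX units can be illustrated with a minimal sketch. It is not the authors' implementation: the `year` field, the toy data, and all function names are hypothetical, and a nearest-centroid scorer stands in for the SVM base learner so that the example needs only the standard library.

```python
from collections import defaultdict
from itertools import product


def group_by_key(samples, key):
    """Split samples into subsets that share the same prior-knowledge key."""
    groups = defaultdict(list)
    for s in samples:
        groups[key(s)].append(s)
    return dict(groups)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def train_module(pos_vectors, neg_vectors):
    """Stand-in base learner (NOT an SVM): compare distances to two centroids."""
    cp, cn = centroid(pos_vectors), centroid(neg_vectors)

    def score(x):
        d_pos = sum((a - b) ** 2 for a, b in zip(x, cp))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, cn))
        return d_neg - d_pos  # > 0 means x is closer to the positive subset

    return score


def train_min_max(pos, neg, key):
    """Train one module per (positive subset, negative subset) pair."""
    pos_groups = group_by_key(pos, key)
    neg_groups = group_by_key(neg, key)
    modules = {}
    for (pk, pg), (nk, ng) in product(pos_groups.items(), neg_groups.items()):
        modules[(pk, nk)] = train_module([s["x"] for s in pg],
                                         [s["x"] for s in ng])
    return modules, list(pos_groups), list(neg_groups)


def predict_min_max(model, x):
    """MIN over modules sharing a positive subset, then MAX over those minima."""
    modules, pos_keys, neg_keys = model
    return max(min(modules[(pk, nk)](x) for nk in neg_keys) for pk in pos_keys)


# Hypothetical toy data: each class drifts between two years (XOR-like layout).
pos = [{"x": (0.0, 0.0), "year": 2000}, {"x": (0.2, 0.1), "year": 2000},
       {"x": (4.0, 4.0), "year": 2001}, {"x": (3.9, 4.1), "year": 2001}]
neg = [{"x": (0.0, 4.0), "year": 2000}, {"x": (0.1, 3.9), "year": 2000},
       {"x": (4.0, 0.0), "year": 2001}, {"x": (4.1, 0.2), "year": 2001}]

model = train_min_max(pos, neg, key=lambda s: s["year"])
```

In this toy layout the two class centroids coincide, so a single centroid-based classifier cannot separate the classes, while the four year-wise modules combined by MIN and MAX can. A positive value of `predict_min_max(model, x)` is read as the positive class.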

Similar resources

[Title garbled in the source; per the abstract below, the paper concerns incorporating prior knowledge into min-max modular classifier training]

In most large-scale real-world pattern classification problems, there is always some explicit information besides the given training data, namely prior knowledge, with which the training data are organized. In this paper, we propose a framework for incorporating this kind of prior knowledge into the training of min-max modular (M³) classifiers to improve learning performance. In order to evaluate th...

Learning from imbalanced data sets with a Min-Max modular support vector machine

Imbalanced data sets have significantly unequal distributions between classes. This between-class imbalance causes conventional classification methods to favor majority classes, resulting in very low or even no detection of minority classes. A Min-Max modular support vector machine (M³-SVM) approaches this problem by decomposing the training input sets of the majority classes into subsets of sim...

Gender Classification Using a Min-Max Modular Support Vector Machine with Incorporating Prior Knowledge

Gender classification based on facial images is by nature a large-scale, complicated two-class classification problem. The reason is that little is known about the mechanism by which humans discriminate male from female facial images, and a large number of varied facial images is required to train the gender classifier. This paper presents a method for dealing with the gender clas...

Incorporating Linguistic Knowledge for Learning Distributed Word Representations

Combined with neural language models, distributed word representations achieve significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion, which, however, do not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge...

A Parallel and Modular Pattern Classification Framework for Large-Scale Problems

The number of samples that are available on the internet to train pattern classifiers is increasing rapidly, while traditional pattern classification techniques based on a single computer system are powerless to process these large-scale data sets. This chapter presents a parallel and modular pattern classification framework for coping with large-scale pattern classification problems. The propo...

Publication date: 2009